Lazily convert Prism/RBS locations to CodeRange objects by sinsoku · Pull Request #436 · ruby/typeprof

sinsoku · 2026-04-13T13:41:53Z

AST node initializers were eagerly calling code_range_from_node to
convert Prism::Location/RBS::Location into CodeRange, triggering
expensive UTF-16LE encoding via String#encode (9.0% of CPU samples
in tool/dog_bench.rb).

Store the raw location in @mid_code_range_loc / @cname_code_range_loc
and convert lazily on first access via ||=. In batch mode most values
are never accessed; in LSP mode the cost is deferred from file loading
to first interactive use (hover, go-to-definition, etc.).

Also remove mid_code_range from attrs hashes to avoid comparing
unconverted locations in Node#diff.

Measured with tool/dog_bench.rb (stackprof):

String#encode: 89 samples (9.0%) -> 19 samples (1.9%)

mame · 2026-04-28T12:44:04Z

I'm hoping that switching to UTF-8 in #442 will significantly cut down the time spent in String#encode. The CodeRange object itself still gets allocated, though. I wonder if that allocation overhead might be neglectable?

AST node initializers were eagerly calling `code_range_from_node` to convert `Prism::Location`/`RBS::Location` into `CodeRange`, triggering expensive UTF-16LE encoding via `String#encode` (9.0% of CPU samples in `tool/dog_bench.rb`). Store the raw location in `@mid_code_range_loc` / `@cname_code_range_loc` and convert lazily on first access via `||=`. In batch mode most values are never accessed; in LSP mode the cost is deferred from file loading to first interactive use (hover, go-to-definition, etc.). Also remove `mid_code_range` from `attrs` hashes to avoid comparing unconverted locations in `Node#diff`. Measured with `tool/dog_bench.rb` (stackprof): - `String#encode`: 89 samples (9.0%) -> 19 samples (1.9%)

sinsoku · 2026-05-04T10:24:02Z

Thanks for the review! I've rebased the branch onto master and force-pushed.

The default position_encoding is still UTF_16LE (env.rb:326), so String#encode is still called unless utf-8 is explicitly requested.

I ran tool/dog_bench.rb again on master (c824bb81, 5 runs each):

	master	this PR	delta
Elapsed (avg of 5)	1.402 s	1.238 s	−11.7%
`String#encode`	100 (8.4%)	29 (2.9%)	−71 samples
GC total	463 (39.0%)	364 (35.9%)	−99 samples
Total samples	1187	1015	−172

Two points:

Even on the default UTF-16LE path, String#encode still takes 8.4%. Lazy conversion brings it down to 2.9%.
The CodeRange allocation itself also matters — GC samples drop by ~99 (about 8% of total). This gain doesn't depend on position_encoding, so UTF-8 users benefit too.

So I think lazy conversion is still worth it, even after #442.

sinsoku force-pushed the lazy-convert-code-range branch from 08c15e6 to 61d3805 Compare May 4, 2026 10:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazily convert Prism/RBS locations to CodeRange objects#436

Lazily convert Prism/RBS locations to CodeRange objects#436
sinsoku wants to merge 1 commit intoruby:masterfrom
sinsoku:lazy-convert-code-range

sinsoku commented Apr 13, 2026

Uh oh!

mame commented Apr 28, 2026

Uh oh!

sinsoku commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sinsoku commented Apr 13, 2026

Uh oh!

mame commented Apr 28, 2026

Uh oh!

sinsoku commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants